STAT 313: Applied Experimental Design and Regression Models

Instructor Contact Information

  • Allison Theobold, Ph.D.
  • Email:
  • Office: Building 25 Office 105 (by Statistics Department Office)

Course Info

Room:

  • M, T, R – Constr Innovations Center C202
  • W – Kennedy Library 0111B

Times:

  • Section 01: 8:10-9:00am
  • Section 02: 9:10-10:00am

Office Hours

Day Time
Tuesdays 1:00 pm – 2:30 pm (in-person and / or Discord)
Wednesdays 7 – 8 pm (on Discord)
Thursdays 1:00 pm – 2:30 pm (in-person and / or Discord)

I am available for individual appointments on Tuesdays and Thursdays outside of these time slots, but appointments must be made at least 24 hours in advance through Calendly, using the following link: https://calendly.com/allisontheobold


Required Materials

For this course we will be using one main textbook, accompanied by additional resources. The textbooks we are using are free, and you have the option to obtain a printed copy if you wish.

Textbooks

Çetinkaya-Rundel and Hardin, Introduction to Modern Statistics. https://openintro-ims.netlify.app/

Ismay & Kim, Modern Dive: Statistical Inference via Data Science. https://moderndive.com

Required Technology

R is the statistical software we will be using in this course (https://cran.r-project.org/).

RStudio is the most popular way to interact with the R software. We will be interacting with RStudio through RStudio Cloud (https://rstudio.cloud/). You will join the Stat 313 workspace, and then be able to access the course homework and lab assignments. We will be walking through this in the first week of lab!

I strongly advise you to pay for the $5 per month plan with RStudio Cloud. The free plan only gives you 25 hours of working on projects a month, and I don’t want anyone to run out of time and not be able to complete their assignment!


For questions of general interest, such as course clarifications or conceptual questions, please use the Class Discord Server. Refer to the Day One Class Setup materials for more information on how to effectively use this server.


Welcoming Classroom

I value diversity, inclusion and equity in this (and every) class. I hold the fundamental belief that everyone is fully capable of learning and doing statistics. There is more than one way to address a statistical problem, and our learning will be richer by being open to different ideas, rejecting stereotypes, and being aware of—in order to minimize—our biases. I look forward to getting to know you all as individuals and as a learning community.


Course Description and Learning Objectives

Catalog Description: Applications of statistics for students not majoring in statistics or mathematics. Analysis of variance including one-way classification, randomized blocks, and factorial designs; multiple regression, model diagnostics, and model comparison. Prerequisite: Stat 217, Stat 218, Stat 221 or Stat 312

Expected Outcomes:

  • Identify, describe, and contrast different study designs:

    • observational studies
    • one-factor completely randomized designs
    • randomized complete block designs
  • Analyze and interpret the results of analysis of variance (ANOVA) from different study designs, including multiple comparisons of means.

  • Analyze and interpret the results of simple linear regression models, and multiple linear regression models using both quantitative and categorical predictor variables.

  • Understand and apply variable selection techniques in regression.

  • Describe the assumptions of regression models and evaluate whether they are violated for a particular analysis.

  • Describe Type I and Type II errors in the context of a study, and critically evaluate how different aspects of a study design will affect these errors.

  • Identify potential ethical issues surrounding data collection and analysis.

  • Create thoughtful single-variable, two-variable, and multivariable data visualizations to aid in the communication of statistical findings.

  • Carry out data wrangling tasks required to prepare data for analysis, including selecting variables, filtering observations, and mutating variables.

  • Understand and describe the importance of reproducibility in scientific research, and collaborate in groups to create a reproducible statistical report.

COVID-19 Addendum

I understand and appreciate the strange world we find ourselves living in today. Surrounded by a variety of factors, many of us find ourselves overwhelmed, frustrated, confused, and sad. I appreciate all of these outside forces you are navigating, and will do my best to monitor the pace of our course so that no one feels left behind. Yes, this may mean that we do not hit every learning outcome. What matters most to me is that you leave this course with an appreciation for the complexities involved in building a statistical model and a grasp of the importance of good experimental design, and that you participate in group conversations about statistical reasoning and problem solving.


Course Organization

This class is organized into six units. The skills learned at the beginning of the course will carry throughout the remainder of the course. I hope that you are able to see the connections between the topics of the different units, since they are all part of one big family—the regression family!

Unit 1: Foundations of Statistics (Week 1)

This introductory unit has three big tasks: (1) review statistical and data-oriented concepts you have (likely) seen before, (2) think critically about why statistics is used in science, and (3) think about how statistics has (historically) been used for inference.

Reading: Chapters 1 and 2 in Introduction to Modern Statistics (IMS), with supplementary articles

Unit 2: Exploratory Data Analysis (Weeks 2 & 3)

This unit focuses on building skills for working with and visualizing different types of data. First, we will focus on categorical data: creating summary tables and bar charts. Next, we will turn our attention to numerical data: calculating summary statistics and creating histograms, scatterplots, and line graphs.

This unit will pair:

  • Chapter 5 in IMS with Chapter 2 (sections 1-7) in Modern Dive
  • Chapter 4 in IMS with Chapter 2 (section 8) in Modern Dive

Unit 3: Regression Modeling (Weeks 5 & 6)

In this unit we finally begin exploring statistical methods. You will put the tools you learned for wrangling and visualizing data to work in the context of linear regression. We will start in a (likely) familiar setting: simple linear regression. Once we’ve explored the concepts of “simple” regression, we will turn up the heat and add more explanatory variables with multiple linear regression.

This unit will pair:

  • Chapter 7 in IMS with Chapter 5 in Modern Dive
  • Chapter 8 in IMS with Chapter 6 in Modern Dive

Unit 4: Foundations of Statistical Inference (Weeks 7 & 8)

This unit will start with Chapter 7 in Modern Dive, setting the stage for why statisticians care so much about variability. We will then use these ideas to walk through Chapters 11 through 13 in Introduction to Modern Statistics, exploring different methods for summarizing the variability we might expect to see across different samples.

We will use these avenues to explore concepts you have seen before: hypothesis tests and confidence intervals. However, the main focus will be on sampling variability, not significance testing. We will visit the ideas of p-values and significance testing, with an emphasis on making (and justifying) sound scientific decisions.

Unit 5: Inference for Regression (Week 9)

Now that we’ve discussed the ideas behind sampling variability and statistical inference, we’ll explore these ideas in the context of linear regression. We will explore simple linear regression first, looking at how we can assess how good a job our explanatory variable does of explaining the response variable. Next, we’ll discuss how we can extend these ideas to a regression with multiple explanatory variables, using model selection criteria.

This unit will explore:

  • Chapter 24 in IMS with Chapter 10 in Modern Dive
  • Chapter 25 in IMS

Unit 6: ANOVA, a (Boring) Case of Regression (Week 10)

To wrap up the quarter, we will look at a special case of linear regression: ANOVA. In this special case, our regression will include only categorical variables as explanatory variables. We will first review how we compare the means of two groups, and then connect this with what we learned about categorical variables in multiple linear regression to conceptualize how we can compare the means of three or more groups.

This unit will explore Chapters 21 and 22 in IMS.


Course Components

Canvas will be your resource for the course materials necessary for each week. There will be a published “coursework” page which you can access through RStudio Connect. The page will walk you through what you are expected to do each week, including:

  • textbook reading
  • lecture videos
  • homework questions
  • quiz questions

Discord will be your resource for course discussions and questions.

RStudio Cloud will be your resource for:

  • data sets
  • lab assignments
  • group projects
  • software resources

Zoom will be used for the midterm and final oral exams.

Communication

Every Sunday evening there will be an announcement on Canvas letting you know what is due over the next week, and the material we will be covering. The module for each week will be released on Sunday evening, so you can look over the content and see what the plan is for the week.

We will use Discord to manage questions and responses regarding course content. There are channels for the different components of each week (e.g., Week 1 Lab Assignment). Please do not send an email about homework questions or questions about the course material. It is incredibly helpful for others in the course to see the questions you have and the responses to those questions. I will try to answer any questions posted to Discord within 3-4 hours (unless it is posted at midnight). If you think you can answer another student’s question, please respond!

Generally, I will work either Saturday or Sunday each weekend, and will work from approximately 7am to 5pm during the week. I will attempt to respond to emails within 24 hours, but emails sent on a Saturday night may not receive a response until Monday morning. If you don’t hear back from me within 48 hours, assume I did not receive your email and resend it!


Weekly Schedule

Each week in STAT 313 will, for the most part, look something like this:

Individual Expectations

Group Expectations

Individual Assignments

Readings and Videos

I favor a “flipped classroom,” since I believe it gives you more hands-on experience working through the concepts we are learning. Thus, you should expect to dedicate time early in the week to reading the course material and watching the necessary videos.

Weekly Quizzes (Due Fridays at midnight)

Each week there will be a short (~10 questions) quiz over the reading and videos from the week. These quizzes are intended to ensure that you grasped the key concepts from the week’s readings. The quizzes are not timed, so you can feel free to check your answers with the textbook and/or videos if you so wish. You can attempt each quiz up to three times before the deadline!

Think Out Loud Recording (Due Sundays at midnight)

Every week I will post one “big picture” conceptual question about the content covered the prior week, and you are required to record your response to the prompt. You should think of this “think out loud” recording as the second checkpoint for the content covered each week.

Each recording will be done through FlipGrid. The recordings will have an associated Canvas assignment (posted in the week’s module), which will direct you to the FlipGrid application. You will need to use your computer audio to record, and I would prefer that you use your video camera during your recording.

Scores for this portion of the course will be allocated similarly to homework submissions: full credit will be given so long as you provide a response that demonstrates you have spent time thinking about the content. I will provide individualized feedback on your recording, letting you know where you are on the right track and what areas you could review.

I would like you to think about these recordings as preparation for the course’s oral exams. During these exams, you will be given 10 minutes to reason through two conceptual questions, similar to the knowledge questions previously covered. Give these knowledge checks some honest effort! They help you study for the midterm and final exams, and the feedback I give you can build your understanding before exam week.

While grading, I will look for especially clear explanations and contact the students who provided them. You will be asked if you are comfortable with me sharing your video with the rest of the course. You are free to say either “yes” or “no,” as your grade does not depend on your answer.

Tutorials

On the lab assignment weeks, I will assign a set of tutorials focusing on specific skills in R. These tutorials provide a review of the concepts covered in the textbook, give examples of how to work with data in R, and have hands-on exercises where you will need to write the R code necessary to complete a given task (with hints provided).

The tutorials are self-paced, so you can complete them all at once or gradually throughout the week. The lab assignments will require you to put the skills you learned in the tutorials to work, so completing the tutorials is necessary to complete the R code in each lab assignment.

Team-based Assignments

Group Discussions / Conceptual Homework (Due Wednesdays at 8pm)

You will be expected to regularly participate in groups in class, discussing and working through questions associated with the content for the week. Your group will be expected to submit a document with your responses to each question given that week.

Homework assignments will be given full credit, so long as each question was “reasonably” attempted. A “reasonable” attempt requires that the responses to each question (1) reflect critical thinking about the central concept(s) of the question, and (2) provide a unique insight into the group’s thinking.

At the end of each week, I will provide every group with feedback regarding their ideas, and I will provide the entire course with a “summary” of ideas that stood out and key misconceptions. You should think of these collaborative questions as your first “checkpoint” in the week’s content. You will get feedback as you progress through the checkpoints, so you have a better idea of what you need to work on as you prepare for the exams.

Lab Assignments (Due Sundays at midnight)

Labs will be assigned every other week, providing the opportunity to explore the course concepts in the context of real data. Lab assignments will require you to work through the tutorials for the week, so start the tutorials early!

You will complete the lab assignments in the same teams you collaborate with in class. You will access the lab assignment through RStudio Cloud, which you will be walked through during the first lab. Your group will be expected to submit your completed lab on Canvas. You will need to submit both the HTML and RMarkdown documents.

Lab Assignment Grading

I expect that you will approach each lab assignment seriously, investing the necessary time and energy to prepare your responses. Unlike what you may have experienced in other courses, lab assignments are graded for “mastery” of the concepts. The degree to which you “mastered” each question is assessed with the following four-point scale.

Score Justification
4 Master – Outstanding work that exhibits comprehensive and thoughtful understanding of the content of the question, with an individualized perspective. Errors do not occur in the assignment.
3 Practitioner – Work reflects a solid understanding of the content of the question. Work may not exhibit original perspectives. Several errors may occur in the question.
R Redo (Required) – Work satisfies the requirements of the assignment while missing the spirit of the assignment. May be an incomplete rendering of the assignment. The work may contain inconsistencies or demonstrate limited understanding of the content. May be incoherent or poorly written.
1 Novice – The work shows a lack of understanding of the assignment.
0 No credit – The problem was not seriously attempted.

For questions that your team reasonably attempted but did not earn a “Practitioner” score, your team will be asked to revise and resubmit your response. Problems marked with an “R” can be redone and returned to earn “Practitioner” credit. Although you may have as many opportunities as needed to correctly redo problems, your first attempt at a redo must be completed and turned in within two days (48 hours) after receiving your graded lab assignment.

For each problem marked with an “R”, you will be required to submit a redo form in order to earn back a “Practitioner” score (3 points) on the problem. Submit your revisions to the same assignment portal, along with one redo form for each revised problem.

Each lab assignment will receive a score that reflects the average points per question. For example, suppose your lab assignment has four questions and you receive the following scores: 3, 4, 3, and 4. Then, your overall score is \(\frac{3 + 4 + 3 + 4}{4} = 3.5\). Suppose instead you received the following scores: 3, 4, 3, and R. Then, your initial overall score is \(\frac{3 + 4 + 3 + 0}{4} = 2.5\), with one redo outstanding.
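The averaging rule can be sketched in a few lines of Python. This is a minimal illustration, not an official grade calculator: the function name `assignment_score` is hypothetical, and it assumes an “R” counts as 0 until the redo is accepted, at which point the question earns a “Practitioner” score of 3.

```python
def assignment_score(question_scores):
    """Average points per question on the four-point mastery scale.

    Hypothetical helper for illustration only: an "R" (redo required)
    counts as 0 until the redo is accepted.
    """
    points = [0 if score == "R" else score for score in question_scores]
    return sum(points) / len(points)


print(assignment_score([3, 4, 3, 4]))    # 3.5
print(assignment_score([3, 4, 3, "R"]))  # 2.5, before the redo
print(assignment_score([3, 4, 3, 3]))    # 3.25, after the redo earns a 3
```

Note that an accepted redo raises the average to 3.25 here, but never above what a first-attempt “Practitioner” score would have earned.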

Do not interpret 3/4 as equivalent to a grade of 75%. This rubric is intended to be about the quality of your work, not the percentage of points earned. It is likely that students consistently scoring “Practitioner” on this rubric will earn at least a B- in the course.

Letter grades will be assigned according to the following minimums: 3.75 (A), 3.0 (B-), 2.85 (C), below 2.85 (C-, D, or F).

Note: If you have more than three “0” grades due to turning assignments in late, un-revised “Rs,” failure to participate in group discussions, or missed assignments, the highest grade you will earn for the class is a C-.

Projects

There will be two projects throughout the quarter, where you will be asked to apply the statistical concepts you have learned in the context of real data. Each of these projects will be done in the teams you have been working with in class. More details will be provided during class.

(Oral) Exams

There will be two assessments designed for students to demonstrate their understanding of content covered during the course. Each assessment will be held virtually as an individual oral exam. These exams will require you to talk through a set of open-response questions, demonstrating your understanding of the concepts. Each exam will follow the submission of a group project, with the first occurring approximately halfway through the course and the second taking place during finals week.


Working in Teams

Team Member Roles

Your team will rotate group roles each week, so that no one person acts as the “team manager” for more than one week. Instead, the following roles will circulate each week, so that each member of the group is able to complete each role.

Role Responsibilities
Manager Responsible for organizing the team work: making sure all roles are assigned and clear, scheduling meetings, and leading discussion of lab assignment problems. During the group discussions, the manager is responsible for making sure everyone has a chance to contribute, asking quiet team members to speak up, asking loud team members to listen to others, and bringing the conversation back to the lab assignment if it drifts.
Recorder Responsible for collecting, organizing, and recording answers to the assignment during the discussions, compiling the summary of the answers discussed, and sending the summary to the editor.
Editor Responsible for reviewing the draft summary provided by the recorder, sharing the summary with the team, soliciting feedback from the team, and submitting the final assignment by the deadline.
Clarifier During the team meeting the clarifier should assist the group by paraphrasing the ideas presented by other group members, e.g. “Let me make sure I understand…”. The clarifier is responsible for making sure that everyone in the group understands the solutions to the problems.

There will be confidential peer evaluations completed every two weeks. I will use these to check in on each group’s dynamics and ensure that everyone feels their voice is being heard.

Team Meetings

There will be time in each class for your team to work on the assignment for the week; however, this time will not be sufficient to complete the assignments. Therefore, every group is expected to meet for at least 2 hours outside of class.

If you miss more than one of your team meetings in a given week, you will be expected to complete that week’s assignment on your own.

My hope is that each member of the group looks over the week’s assignment on Monday while reading the week’s chapter(s). Then, each member should have some initial ideas to propose by Tuesday’s team meeting. During this meeting the recorder can begin the process of writing up the group’s ideas. The recorder will then provide the editor with their summary of the group’s ideas. As a team, you can choose to work together or independently on the assignment’s problems between course meetings. The manager is responsible for scheduling additional meetings, so that the editor is able to submit the assignment on time.


Grade Breakdown

Your grade in STAT 313 will contain the following components.

Note: If you have more than three “0” grades due to turning assignments in late, un-revised “Rs,” failure to participate in group collaborations, or missed assignments, the highest grade you will earn for the class is a C-.


Other Policies

Late Work Policy

Assignments and redos are expected to be submitted on time. However, every student will be permitted to submit one individual assignment 24 hours late without question. You do not need to contact me to use this allowance, but if you have already used it and cannot complete another assignment by the due date, you are expected to email me. Once you email me, we can work together to find a deadline that is fair to both you and other students. If I do not hear from you, I will deduct 5% from your score for every day an assignment is late, up to four days.
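One possible reading of the late-work deduction, sketched in Python under stated assumptions. The function name `late_score` is hypothetical; the sketch assumes the 5% is taken from the earned score for each whole day late, that the deduction stops growing after four days (the policy does not say what happens beyond that), and that the one-time 24-hour allowance is modeled as a simple flag.

```python
def late_score(score, days_late, use_allowance=False):
    """Hypothetical sketch of the late-work deduction.

    Assumptions (not stated in the policy): days_late counts whole days,
    each day removes 5% of the earned score, and the deduction is capped
    at four days (20%).
    """
    if days_late <= 0:
        return score
    # The one-time allowance covers a single individual assignment up to 24 hours late.
    if use_allowance and days_late <= 1:
        return score
    penalty = 0.05 * min(days_late, 4)
    return score * (1 - penalty)


print(late_score(100, 1, use_allowance=True))  # 100, covered by the allowance
print(round(late_score(100, 2), 2))            # 90.0
print(round(late_score(100, 6), 2))            # 80.0, capped at four days
```

Emailing ahead of the deadline replaces this calculation entirely with a negotiated deadline, so the function only models the case where no contact is made.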

You are also expected to participate in the week’s team collaborations in a timely manner. I will be able to monitor your involvement through your course attendance. If you find that you are unable to participate in the week’s team collaborations in a timely manner, please let your team know and contact me. Once you contact me, we can create a timeline that will both allow for you to engage in the assignment and allow for members of your team to receive feedback from you.

Behavioral Expectations

This course will use group work and classroom activities to better engage your understanding of the material. The environment for these discussions and activities is expected to adhere to ground rules, so that we create an open space where everyone feels welcome, supported, and comfortable voicing their opinions.

By participating in this community, students agree to abide by the classroom Code of Conduct and accept the procedures by which any Code of Conduct incidents are resolved. Any behavior that excludes, intimidates, or causes discomfort is a violation of the Code of Conduct. In order to foster a positive and professional learning environment we encourage the following kinds of behaviors in all platforms and events:

  • use welcoming and inclusive language
  • everyone participates and no one dominates
  • listen to understand
  • share airtime
  • one speaker at a time
  • disagree with respect
  • be respectful of different viewpoints and experiences
  • all ideas are valid
  • gracefully accept constructive criticism
  • treat everything you hear as an opportunity to learn and grow
  • seek common ground and understanding

Cheating and Plagiarism

Simply put, I will not tolerate cheating or plagiarism.

Any incident of dishonesty, copying, exam cheating, or plagiarism will be reported to the Office of Student Rights and Responsibilities.

Cheating will earn you a grade of 0 on the assignment and an overall grade penalty of at least 10%. In circumstances of flagrant cheating, you may be given a grade of F in the course.

Paraphrasing or quoting another’s work without citing the source is a form of academic misconduct. This includes R code produced by someone else! Writing code is like writing a paper: it is obvious when you have copied and pasted a sentence from someone else into your paper, because the way each person writes is different.

Even inadvertent or unintentional misuse or appropriation of another’s work (such as relying heavily on source material that is not expressly acknowledged) is considered plagiarism. If you are struggling with writing the R code for an assignment, please reach out to me. I would prefer to help you rather than have you spend hours Googling things and getting nowhere!

If you have any questions about using and citing sources, you are expected to ask for clarification.

For more information about what constitutes cheating and plagiarism, please see https://academicprograms.calpoly.edu/content/academicpolicies/Cheating.